Inside LLMs: Psychology & Limits, Prompts, GUI, and Agent Evals | Ep. 13

Update: 2025-06-25

Description

In this episode of the Agent Workforce Podcast, Kristiina and Niko reunite to discuss the latest in LLM trends, model chaos, and real-world AI adoption. From Karpathy’s Y Combinator summer school talk to what’s actually inside a system prompt, we cover a lot of ground — including energy use, pricing shifts, and the surprising rise of OpenEvidence as the go-to tool for U.S. physicians.

Plus: insights into Sam Altman’s comments on AI resource use, GPT-4 pricing shifts, and why evals are actually the unsung heroes of scalable AI solutions.

🔍 Topics include:

Karpathy’s Y Combinator summer school talk: “software is changing again”
The evolving GUI of LLMs & user autonomy
How much energy ChatGPT really uses
OpenEvidence is becoming the “search bar” for U.S. doctors
o3 model price drops and selection confusion
Prompting at different levels (system, user, custom GPTs)
How Agent Workforce agents are built under the hood
The role of evals in building trustworthy agents
A quick summer sign-off and a shoutout to agentacademy.ai

In Channel

First Thoughts on Antigravity – Google’s New Coding Agent and the Rise of Multimodal Models | Ep. 23

2025-12-05 (22:40)

Episode 21: Special Episode: Inside the World of Antti Parviainen, AI Agent Developer | Ep. 21

2025-11-08 (31:54)

Surfing with AI, Agent Builders are everywhere, will multimodal LLMs revolutionize IDP? | Ep. 20

2025-10-31 (27:25)

Special Episode with Kseniia Palin: From AI to GenAI and engineering for agentic workloads | Ep. 19

2025-10-08 (42:11)

AI Agents in Enterprises, Global ChatGPT Usage & How Management Teams Use AI | Ep. 18

2025-09-26 (30:37)

Nano Banana – The End of Photoshop? MAI – Copilot’s Future & GenAI’s Top 50 | Ep. 17

2025-09-05 (26:59)

Prompting GPT-5, the GenAI Divide, and Multi-Agent Innovations | Ep. 16

2025-08-22 (27:24)

Season 2 Kickoff: GPT-5, World Models, Genie 3, Microsoft’s AI Impact Report & Agentic Updates | Ep. 15

2025-08-15 (33:44)

GPT 5.1 follows instructions better, Agent Builder experiments, McKinsey’s AI wake-up call & AI-led cyber espionage jailbreaks Claude Code | Ep. 22

2025-11-14 (20:34)

Season Finale: What We’ve Learned from Prompts to Orchestration | Ep. 14

2025-07-04 (31:44)

Inside LLMs: Psychology & Limits, Prompts, GUI, and Agent Evals | Ep. 13

2025-06-25 (32:51)

Claude 4 & ASL-3 Activated, Snitching LLMs, and Our Agentic Systems Webinar | Ep. 12

2025-06-05 (27:26)

AI Goes to School, Shops with You, and Writes Your Code | Ep. 11

2025-05-22 (31:10)

Special Episode with Karli Kalpala: Introducing a Vertical Agent for PMI Claims Processing | Ep. 10

2025-05-15 (42:39)

Special Edition with AI Researcher Rami Luisto: LLM Research Highlights, OpenAI’s New Direction & Document Processing in Practice | Ep. 9

2025-05-09 (53:52)

Going AI-First: Duolingo, highlights from Microsoft Work Index 2025, UiPath Agentic AI | Ep. 8

2025-04-30 (28:08)

Shopify CEO's ‘Leaked’ AI Memo, OpenAI’s Memory Update, and Google’s NotebookLM | Ep. 7

2025-04-11 (26:26)

Hackathon stories, Model Context Protocol, The next wave of image generation | Ep. 6

2025-04-03 (24:01)

The Rise of Vibe Coding: AI’s Role in Software Creation | Ep. 5

2025-03-21 (21:32)

Manus is Here – Thoughts on GPT-4.5, Distillation, and Agentic Orchestration | Ep. 4

2025-03-13 (30:09)

Digital Workforce Services